We examine the problem of learning a single occurrence regular expression with interleaving (SOIRE) from a set of noisy text strings. SOIRE has unrestricted support for interleaving and covers most of the regular expressions used in practice. Learning SOIREs is challenging because it requires heavy computation and because text strings usually contain noise in practice. Most previous work learns only restricted SOIREs and is not robust on noisy data. To tackle these issues, we propose a noise-tolerant differentiable learning approach for SOIRE, called SOIREDL. We design a neural network to simulate SOIRE matching of given text strings and theoretically prove that a class of the parameter sets learnt by the neural network, called faithful encodings, is in one-to-one correspondence with SOIREs of bounded size. Based on this correspondence, we interpret the target SOIRE from the parameters of the neural network by exploring the nearest faithful encodings. Experimental results show that SOIREDL outperforms the state-of-the-art approaches, especially on noisy data.
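To make the interleaving operator concrete: a string belongs to the interleaving of two languages if it can be split into two subsequences, one from each language. The minimal sketch below checks only the simplest case, where both operands are fixed strings. The function name `is_interleaving` is hypothetical; the code illustrates the semantics of interleaving and is not the SOIREDL matcher or learner.

```python
from functools import lru_cache


def is_interleaving(s: str, u: str, v: str) -> bool:
    """Return True if s is an interleaving (shuffle) of u and v."""
    if len(s) != len(u) + len(v):
        return False

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # i symbols of u and j symbols of v have been consumed,
        # so the next symbol of s to explain sits at position i + j.
        if i == len(u) and j == len(v):
            return True
        k = i + j
        # Try to take the next symbol from u first, then from v.
        if i < len(u) and u[i] == s[k] and match(i + 1, j):
            return True
        return j < len(v) and v[j] == s[k] and match(i, j + 1)

    return match(0, 0)


if __name__ == "__main__":
    print(is_interleaving("acbd", "ab", "cd"))
    print(is_interleaving("abdc", "ab", "cd"))
```

The memoized recursion visits each pair (i, j) at most once, so the check takes time proportional to len(u) * len(v).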
Translated by Google Translate
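Vision Transformers (ViTs) have shown promising performance compared with convolutional neural networks (CNNs), but training ViTs is much harder than training CNNs. In this paper, we define several metrics, including the Dynamic Data Proportion (DDP) and the Knowledge Assimilation Rate (KAR), to investigate the training process, and we divide it into three periods: formation, growth, and exploration. In particular, at the last stage of training, we observe that only a tiny portion of the training examples is used to optimize the model. Given the data-hungry nature of ViTs, we ask a simple but important question: is it possible to provide abundant "effective" training examples at every stage of training? To address this, we need to answer two critical questions, i.e., how to measure the "effectiveness" of individual training examples, and how to systematically generate a sufficient number of "effective" examples. To answer the first question, we find that the "difficulty" of training samples can serve as an indicator of their "effectiveness". To answer the second, we propose to dynamically adjust the "difficulty" distribution of the training data across these evolution stages. To achieve both, we propose a novel data-centric ViT training framework that dynamically measures the "difficulty" of training samples and generates "effective" samples for models at different training stages. Furthermore, to further enlarge the number of "effective" samples and alleviate overfitting in the late training stage of ViTs, we propose a patch-level erasing strategy called PatchErasing. Extensive experiments demonstrate the effectiveness of the proposed data-centric ViT training framework and techniques.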
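The abstract does not specify how PatchErasing selects or fills patches, so the following is only a generic sketch of patch-level erasing under stated assumptions: the image is a 2D list of floats, patches form a regular non-overlapping grid, and the erased patches are chosen uniformly at random. The function name `erase_patches` and all of its parameters are hypothetical.

```python
import random


def erase_patches(image, patch_size, erase_ratio, fill=0.0, seed=None):
    """Return a copy of image with a random subset of grid patches set to fill.

    Assumption: uniform random selection; the paper's actual strategy may differ.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Number of patches to erase, truncated toward zero.
    num_erased = int(erase_ratio * rows * cols)
    rng = random.Random(seed)
    erased = rng.sample(range(rows * cols), num_erased)
    out = [list(row) for row in image]  # copy so the input stays untouched
    for index in erased:
        top = (index // cols) * patch_size
        left = (index % cols) * patch_size
        for y in range(top, top + patch_size):
            for x in range(left, left + patch_size):
                out[y][x] = fill
    return out, sorted(erased)


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    erased_image, erased_ids = erase_patches(image, patch_size=2, erase_ratio=0.5, seed=0)
    for row in erased_image:
        print(row)
```

Erasing whole patches rather than individual pixels matches the granularity at which a ViT tokenizes its input, which is presumably why a patch-level strategy is a natural fit here.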
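Continual learning (CL) learns new tasks sequentially, as humans do, with the goal of achieving better Stability (S, remembering past tasks) and Plasticity (P, adapting to new tasks). Because past training data is unavailable, it is valuable to explore how individual training examples differ in their influence on S and P, which may improve the learning pattern toward a better SP. Inspired by the Influence Function (IF), we first study example influence by adding a perturbation to the example weights and computing the influence derivation. To avoid the storage and computation burden of the Hessian inverse in neural networks, we propose a simple yet effective MetaSP algorithm that simulates the two key steps of the IF computation and obtains the S- and P-aware example influence. Moreover, we propose to fuse the two kinds of example influence by solving a dual-objective optimization problem, obtaining a fused influence toward SP Pareto optimality. The fused influence can be used to control the model update and to optimize the storage of rehearsal. Empirical results show that our algorithm significantly outperforms state-of-the-art methods on both task- and class-incremental benchmark CL datasets.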
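Preoperative and noninvasive prediction of meningioma grade is important in clinical practice, as it directly influences clinical decision making. Moreover, brain invasion in meningioma (i.e., the presence of tumor tissue within the adjacent brain tissue) is an independent criterion for meningioma grading and influences the treatment strategy. Although efforts to address these two tasks have been reported, most rely on hand-crafted features, and none attempts to exploit the two prediction tasks simultaneously. In this paper, we propose a novel task-aware contrastive learning algorithm to jointly predict meningioma grade and brain invasion from multi-modal MRIs. Building on a basic multi-task learning framework, our key idea is to adopt a contrastive learning strategy to disentangle the image features into task-specific features and task-common features, and to explicitly leverage their inherent connections to improve the feature representations for the two prediction tasks. In this retrospective study, an MRI dataset was collected comprising 800 patients with meningioma confirmed by pathological analysis (148 high-grade, 62 with invasion). Experimental results show that the proposed algorithm outperforms alternative multi-task learning methods, achieving AUCs of 0.8870 and 0.9787 for the prediction of meningioma grade and brain invasion, respectively. The code is available at https://github.com/isdling/predicttcl.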
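Multi-modal MR imaging is routinely used in clinical practice to diagnose and investigate brain tumors by providing rich complementary information. Previous multi-modal MRI segmentation methods usually perform modal fusion by concatenating multi-modal MRIs at an early or middle stage of the network, which hardly explores the non-linear dependencies between modalities. In this work, we propose a novel Nested Modality-Aware Transformer (NestedFormer) to explicitly explore the intra-modality and inter-modality relationships of multi-modal MRIs for brain tumor segmentation. Built on a transformer-based multi-encoder and single-decoder structure, we perform nested multi-modal fusion on the high-level representations of different modalities and apply modality-sensitive gating (MSG) at lower scales for more effective skip connections. Specifically, the multi-modal fusion is conducted in our proposed Nested Modality-aware Feature Aggregation (NMaFA) module, which enhances long-term dependencies within individual modalities via a tri-orientated spatial-attention transformer, and further complements key contextual information among modalities via a cross-modality attention transformer. Extensive experiments on the BraTS2020 benchmark and a private meningioma segmentation (MeniSeg) dataset show that NestedFormer clearly outperforms the state of the art. The code is available at https://github.com/920232796/nestedformer.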
Machine learning methods have revolutionized the discovery of new molecules and materials. However, the intensive training of neural networks for molecules of ever-increasing complexity has led to exponential growth in computation cost, and with it long simulation times and high energy consumption. Photonic chip technology offers an alternative platform for implementing neural networks, with faster data processing and lower energy usage than digital computers. Photonic technology can naturally implement complex-valued neural networks at no additional hardware cost. Here, we demonstrate the capability of photonic neural networks for predicting the quantum mechanical properties of molecules. To the best of our knowledge, this work is the first to harness photonic technology for machine learning applications in computational chemistry and molecular sciences, such as drug discovery and materials design. We further show that multiple properties can be learned simultaneously on a photonic chip via a multi-task regression learning algorithm, which is also the first of its kind, as most previous works focus on implementing networks for classification tasks.
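Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as our empirical studies show, offline Meta-RL can be outperformed by offline single-task RL methods on tasks with good-quality datasets, indicating that a delicate balance must be struck between "exploring" the out-of-distribution state-actions by following the meta-policy and "exploiting" the offline dataset by staying close to the behavior policy. Motivated by this empirical analysis, we explore model-based offline Meta-RL with regularized Policy Optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safe exploration of out-of-distribution state-actions. In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method as a key building block of MerPO, using conservative policy evaluation and regularized policy improvement; the intrinsic tradeoff therein is achieved by striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. We show theoretically that the learnt policy offers guaranteed improvement over both the behavior policy and the meta-policy, thus ensuring performance improvement on new tasks via offline Meta-RL. Experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods.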
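Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors. Although previous RGB-D-based motion recognition methods have achieved promising performance through tightly coupled multi-modal spatiotemporal representation, they still suffer from (i) optimization difficulty under small-data settings due to the tightly spatiotemporal-entangled modeling; (ii) information redundancy, as the representation usually contains a large amount of marginal information that is weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition. Specifically, we disentangle the task of learning spatiotemporal representation into three sub-tasks: (1) learning high-quality and dimension-independent features through a decoupled spatial and temporal modeling network; (2) recoupling the decoupled representation to establish stronger space-time dependency; and (3) introducing a Cross-modal Adaptive Posterior Fusion (CAPF) mechanism to capture cross-modal spatiotemporal information from RGB-D data. The seamless combination of these novel designs forms a robust spatiotemporal representation and achieves better performance than state-of-the-art methods on four public motion datasets. Our code is available at https://github.com/damo-cv/motionrgbd.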
This paper focuses on designing efficient models with few parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while balancing model accuracy and efficiency well.
In this paper we explore the task of modeling (semi-)structured object sequences; in particular, we focus on the problem of developing a structure-aware input representation for such sequences. We assume that each structured object is represented by a set of key-value pairs that encode its attributes. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as than other methods that use a record-view representation of the sequence \cite{de2021transformers4rec} or a simple flattened representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
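The key-wise view described above, in which a sequence of key-value objects becomes one value sequence per key, can be sketched in plain Python. This covers only the data reshaping that would precede Temporal Value Modeling, not the encoders or the attention layers. The function name `to_key_sequences`, the `missing` placeholder, and the example field names are all hypothetical.

```python
def to_key_sequences(objects, keys, missing=None):
    """Turn a time-ordered list of dicts into one value sequence per key.

    Keys absent from an object are filled with `missing` so that every
    sequence has the same length as the input (an assumption of this sketch).
    """
    return {key: [obj.get(key, missing) for obj in objects] for key in keys}


if __name__ == "__main__":
    # Hypothetical structured objects, ordered by time.
    events = [
        {"action": "view", "item": "A"},
        {"action": "click", "item": "A", "price": 10},
        {"action": "buy", "item": "B", "price": 12},
    ]
    sequences = to_key_sequences(events, keys=["action", "item", "price"])
    for key, values in sequences.items():
        print(key, values)
```

Each per-key list is what a key-conditioned sequence encoder would consume; a flattened representation would instead concatenate all key-value pairs of all objects into a single long sequence.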